DETAILED ACTION
Specification 

1. The Examiner notes, without objection, the possibility of informalities in the abstract. The
Applicant may wish to consider changes during normal review and revision of the disclosure. 

Numbers in the abstract referring to elements in the drawings lengthen the abstract and the 
reference is unclear when not accompanied by the appropriate figure. They may interfere with its 
purpose, which is to determine quickly from a cursory inspection the nature and gist of the 
technical disclosure. The language should be clear and concise. See 37 CFR § 1.72 and MPEP 
§ 608.01(b). 

Claim Rejections - 35 USC § 112 

2. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the 
subject matter which the applicant regards as his invention. 

3. Claim 8 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite for failing
to particularly point out and distinctly claim the subject matter which applicant regards as the 
invention. 

4. Claim 8 is rejected under 35 U.S.C. 112, second paragraph, as being incomplete for
omitting essential steps, such omission amounting to a gap between the steps. See
MPEP § 2172.01. The omitted steps are: a converting step that provides antecedence for "the
conversion," because this claim (claim 8) recites what the conversion "further" comprises. By that
claim element, it is essential that a preceding converting step be recited.

To advance prosecution and evaluate prior art, the Examiner has examined claim 8 as
though it depended from claim 7. By depending from claim 7, the preceding converting step 
appears in claim 6, and antecedence for the terminology "FFT" appears in claim 7. 

Claim Rejections - 35 USC § 102 

5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by another filed in 
the United States before the invention by the applicant for patent or (2) a patent granted on an application for 
patent by another filed in the United States before the invention by the applicant for patent, except that an 
international application filed under the treaty defined in section 351(a) shall have the effects for purposes of this 
subsection of an application filed in the United States only if the international application designated the United 
States and was published under Article 21(2) of such treaty in the English language. 

Basu 

6. Claims 1, 4, 5, and 9 are rejected under 35 U.S.C. 102(e) as being anticipated by Basu [US 
Patent 6,594,629]. 

7. Regarding claim 14, Basu [at columns 19-20] describes an apparatus for extracting 
visemes from a speech signal by describing the content and functionality of the recited limitations 
recognizable as a whole to one versed in the art as the following terminology: 

means for receiving successive frames of digitized analog speech information for the 
speech signal [see Fig. 12, and its descriptions, especially at column 18, line 66-column 19, line
59, of the processor, memory, and software of the sampled audio (talking, speech) stream at 30
frames/second]; 

means for receiving the frames at a fixed rate, filtering each of the successive frames of
digitized analog speech information to synchronously generate a time domain frame classification
vector from each successive frame at the fixed rate, and analyzing each of the vectors to
synchronously generate a set corresponding to each frame [see Fig. 12, and its descriptions, 
especially at column 6, lines 4-55, of the process extracting 25-msec frames of sampled speech, 
extracting acoustic cepstral vectors of succeeding frames formed every 10 msec, and the 
probability module labeling the extracted vectors with phonemes]; 

generate a set of visemes [at column 30, lines 13-28, as subsequently use a phoneme to 
viseme mapping]. 
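
As a rough illustration of the processing cited above (25-msec frames formed every 10 msec, followed by a phoneme-to-viseme mapping), a minimal Python sketch follows. It is not taken from Basu: the 16 kHz sample rate, the function names, and the small mapping table are hypothetical and chosen only for illustration.

```python
# Illustrative only: the sample rate, names, and mapping table are assumptions.

def frame_bounds(num_samples, sample_rate=16000, window_ms=25, hop_ms=10):
    """Return (start, end) sample indices of overlapping analysis frames."""
    window = sample_rate * window_ms // 1000   # samples in one 25-msec frame
    hop = sample_rate * hop_ms // 1000         # a new frame starts every 10 msec
    return [(s, s + window) for s in range(0, num_samples - window + 1, hop)]

# Hypothetical phoneme-to-viseme table; a real system would use a fuller one.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open_jaw",
}

def visemes_for(phoneme_labels, table=PHONEME_TO_VISEME, default="neutral"):
    """Map one phoneme label per frame to one viseme per frame."""
    return [table.get(p, default) for p in phoneme_labels]
```

Under these assumptions, one second of audio yields frames starting every 160 samples, and each frame's phoneme label is translated to a viseme by a simple table lookup.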

8. Claim 1 sets forth a method with limitations comprising the functionality associated with 
using the apparatus recited in claim 14. Because Basu describes those similar limitations as 
indicated there, these claims thus lack novelty and an inventive step accordingly. 

9. Regarding claim 4, Basu also describes: 

the set includes viseme identifiers [at column 13, lines 42-44 and 55-56, as visual speech 
feature vectors (visemes) labeled with phonemes]; 

the set includes confidence numbers [at column 13, lines 61-62, as combine with a 
confidence estimation that refers to a likelihood]; 

the confidence corresponds one to one [at column 13, lines 44-46, as each phoneme 
associated with visual speech feature vectors has a probability associated therewith]. 

10. Regarding claim 5, Basu describes the included claim elements by dependency as indicated 
elsewhere in this Office action. Basu also describes: 

the set consists of an identity of the most likely one [at column 18, lines 5 and 53-54, as
rescore the N-best list to recognize the highest likelihood]; 

it is a viseme [at column 30, lines 13-28, as subsequently use a phoneme to viseme 
mapping]. 

11. Regarding claim 9, Basu also describes:

a spatial classification [at column 18, lines 53-55, as rescore based on video]. 

Sutton 

12. Claims 1, 2, 9, 10, and 12-14 are rejected under 35 U.S.C. 102(e) as being anticipated by 
Sutton [US Patent 6,539,354]. 

13. Regarding claim 14, Sutton [at abstract] describes an apparatus for extracting visemes from 
a speech signal by describing the content and functionality of the recited limitations recognizable 
as a whole to one versed in the art as the following terminology: 

means for receiving, means for filtering, and means for analyzing [at column 23, lines 46-56,
as computer code comprising instructions];

receiving successive frames of digitized analog speech information for the speech signal at 
a fixed rate, filtering each of the successive frames of digitized analog speech information to 
synchronously generate a time domain frame classification vector from each successive frame at 
the fixed rate, and analyzing each of the vectors to synchronously generate a set of visemes 
corresponding to each frame [at column 19, lines 1-16, as receive an input stream in frames of 
speech at 10 ms, compute a feature presentation for each frame, and produce viseme data for the 
frame]. 

14. Claim 1 sets forth a method with limitations comprising the functionality associated with
using the apparatus recited in claim 14. Because Sutton describes those similar limitations as 
indicated there, these claims thus lack novelty and an inventive step accordingly. 

15. Regarding claim 2, Sutton describes the included claim elements by dependency as 
indicated elsewhere in this Office action. Sutton also describes: 

with a latency less than 100 msec with reference to a successive frame [at column 19, 
lines 27-29, as the latency is around 80 ms]. 

16. Regarding claim 9, Sutton also describes:

a spatial classification [at column 19, lines 33-39, as a dedicated viseme estimator trained 
on viseme deformability to go from speech input to visemes in a single neural network]. 

17. Regarding claim 10, Sutton also describes: 

by a neural network (or other) [at column 19, lines 10-11, as include a neural network].

18. Claim 12 sets forth limitations similar to claim 14. Sutton describes the limitations as 
indicated there, for a processor and software that provide the means. Sutton also describes 
additional limitations as follows: 

a processor and a memory that stores programmed instructions that control the processor 
[at column 15, lines 34-45, as a server storing all the software]. 

19. Claim 13 sets forth limitations similar to claim 14. Sutton describes the limitations as
indicated there, for a processor and software that provide the means. Sutton also describes 
additional limitations as follows: 

a processor and a memory that stores programmed instructions that control the processor 
[at column 15, lines 34-45, as a server storing all the software]; 

a display that displays an avatar that is formed [at column 22, lines 3-17, as a display 
through which a synthesis visual output according to the method has a 3D character for reading]; 

using the set of visemes [at column 17, lines 36-43, as viseme tracks are used to render an 
animation]. 

Claim Rejections - 35 USC § 103

20. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness
rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

Basu and Thomson 

21. Claims 6-8 are rejected under 35 U.S.C. 103(a) as being unpatentable over Basu [US 
Patent 6,594,629] in view of David J. Thomson, "An Overview of Multiple-Window and
Quadratic-Inverse Spectrum Estimation Methods," IEEE 1994, pp. VI-185 to VI-194, already of
record. 

22. Claim 6 includes the limitations of claim 1. Basu describes those limitations as indicated 
there. Basu also describes: 

convert each frame to a spectral domain vector [at column 6, lines 20-21, as extract
magnitudes of discrete Fourier transforms in a frame]; 

convert the spectral vectors using DCT [at column 8, lines 24-28, as transform the 
amplitude values, subsequently apply a discrete cosine transform]. 

However, Basu does not provide details of Fourier transformation to the spectral domain.
In particular, Basu does not explicitly describe using prolate spheroid basis functions. Thomson
describes:

convert to a spectral domain vector using N multi-taper discrete prolate spheroid sequence 
basis (MTDPSSB) functions [see Eq. (18) and its description of projecting to a frequency domain 
by N-1 windows of a Slepian sequence (Discrete Prolate Spheroidal Wave Functions)];

they are factors of a Fredholm integral of the first kind [at page VI-186, column 1, as the
projection operation of spectrum estimation is a Fredholm integral of the first kind]; 

N is a positive integer [see Eq. (18) and its summation limits from 0 to N-1].

As indicated, Thomson shows that using N MTDPSSB functions was known to artisans at 
the time of invention. Since Thomson [at page VI-188, column 1] also points out that MTDPSSB
functions have the advantage of the best possible leakage properties for handling a dynamic range, 
it would have been obvious to one of ordinary skill in the art of converting data to the spectral 
domain at the time of invention to include the concepts described by Thomson at least using the 
MTDPSSB functions in Basu's conversion to the spectral domain because the MTDPSSB 
functions were known to have the advantage of the best possible leakage properties for handling a 
dynamic range. 



23. Regarding claim 7, Basu and Thomson describe and make obvious the included claim 
elements by dependency as indicated elsewhere in this Office action. Thomson also describes: 

multiplying a successive frame by one of the MTDPSSB functions to generate N product
sets of the frame [see Eq. (18) and its description, of multiplying (for windowing) the data x_n by
the N values of a Slepian sequence to generate the values of K windows];

performing a FFT of each product set to generate N FFT sets of the frame [see Eq. (18)
and its description, of the exp(-i2πf) and sum, for the FFT used for coefficient computation];

adding (change adding to combining because the addition is done to magnitude spectrums 
rather than separately to the real and imaginary components) together the N FFT sets of the frame 
to generate a summed FFT set of the frame [see page VI-187, column 1, and the example for a
Simple Spectrum Estimate, of summing (combining) the square of the absolute value (magnitude) 
of K coefficients of the expansion coefficients from Fourier transforming]. 

24. Regarding claim 8, if the Examiner's assumption about dependency is correct, Basu and 
Thomson describe and make obvious the included claim elements by dependency as indicated 
elsewhere in this Office action. Thomson also describes: 

scaling the summed FFT set of the successive frame(s) [see page VI-187, column 1, and
the example for a Simple Spectrum Estimate, dividing by K the total of summing K coefficients of 
the expansion coefficients from Fourier transforming]. 
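
The sequence of operations discussed for claims 7 and 8 (multiply the frame by each taper, transform each product set, combine the squared magnitudes, and scale by the number of tapers) can be sketched in Python as follows. This is only an illustration under stated assumptions, not Thomson's or the Applicant's implementation: orthonormal sine tapers are substituted for the discrete prolate spheroidal (Slepian) sequences because the standard library has no routine for the latter, a naive DFT stands in for the FFT, and all function names are hypothetical. Where SciPy is available, `scipy.signal.windows.dpss` could supply actual Slepian tapers in place of `sine_tapers`.

```python
import cmath
import math

def sine_tapers(n, k):
    """k orthonormal sine tapers of length n (an assumed stand-in for DPSS tapers)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1))
             for t in range(n)] for j in range(k)]

def dft(x):
    """Naive O(n^2) discrete Fourier transform; an FFT yields the same values."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def multitaper_spectrum(frame, tapers):
    """Average of squared-magnitude spectra of the tapered copies of one frame."""
    summed = [0.0] * len(frame)
    for taper in tapers:
        product = [s * w for s, w in zip(frame, taper)]   # one product set
        for f, coeff in enumerate(dft(product)):          # one transformed set
            summed[f] += abs(coeff) ** 2                  # combine magnitudes
    return [value / len(tapers) for value in summed]      # scale by taper count

# Usage: a 64-sample cosine at bin 8, analyzed with three tapers.
frame = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
spectrum = multitaper_spectrum(frame, sine_tapers(64, 3))
peak_bin = max(range(33), key=lambda f: spectrum[f])
```

Note that the magnitudes are squared before they are summed, which matches the Examiner's remark that the combination is performed on magnitude spectra rather than separately on real and imaginary components.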



Sutton and Peterson 

25. Claims 3 and 11 are rejected under 35 U.S.C. 103(a) as being unpatentable over Sutton
[US Patent 6,539,354] in view of Peterson et al. [US Patent 5,067,095]. 

26. Claim 3 includes the limitations of claim 2. Sutton describes those limitations as indicated
there. Sutton [at columns 18-19] also suggests that lower latency holds advantages; however, Sutton
does not explicitly describe latency less than 10 msec. 

Like Sutton, Peterson [at column 1, lines 56-59] describes a neural network for speech
recognition, and Peterson also describes: 

latency less than 10 milliseconds [at column 11, lines 27-46, as typically 20 elements and 
delays of 10 microseconds (20 x 0.01 ms = 0.2 ms) to provide an output signal from the input
signal]. 
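
The arithmetic in the bracketed note can be checked with a short Python sketch. It assumes, as the note itself does, that the per-element delays simply add in series; the function name is hypothetical.

```python
def chain_delay_ms(num_elements, delay_per_element_us):
    """Total delay, in milliseconds, of elements traversed in series (assumed additive)."""
    return num_elements * delay_per_element_us / 1000.0

total = chain_delay_ms(20, 10)   # 20 elements at 10 microseconds each
meets_bound = total < 10.0       # compare against the 10 msec bound of claim 3
```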

As indicated, Peterson shows that latency less than 10 milliseconds was known to artisans 
at the time of invention. Since Peterson [at column 1, lines 44-55] also points out that neural 
network processing has the inherent advantage of offering real time execution, it would have been 
obvious to one of ordinary skill in the art of real time speech recognition at the time of invention 
to include the concepts described by Peterson at least latency less than 10 milliseconds by 
adjusting Sutton's neural network to a latency less than 10 milliseconds because that would
provide faster processing within whatever certain degree of error can be tolerated. 

27. Claim 11 includes the limitations of claim 9. Sutton describes those limitations as
indicated there. Although Sutton describes speech recognition and viseme classification using
neural networks, Sutton does not describe details of a neural network. In particular, Sutton does
not explicitly describe a feed-forward, memory-less, perceptron type neural network. 

Like Sutton, Peterson [at column 1, lines 56-59] describes a neural network for speech
recognition, and Peterson also describes: 

a neural network [see Figs. 4a and 4b and their descriptions especially at column 6, lines
16-68, of the SPANN (sequence processing artificial neural network)]; 

feed-forward type [see Figs. 4a and 4b and their descriptions especially at column 6, lines
16-68, of signals-applied-at-the-inputs-processed-and-provided-through-the-outputs];

memory-less type [see Figs. 4a and 4b and their descriptions especially at column 6, lines 
16-68, of signals-applied-processed-and-output]; 

perceptron type [see Figs. 4a and 4b and their descriptions especially at column 6, lines 16-68,
of neurons].
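
For readers unfamiliar with the three qualifiers, the following Python sketch shows what a feed-forward, memory-less, perceptron type network means in general terms. It is a generic textbook construction and not Peterson's SPANN: the layer layout, weights, and function names are hypothetical. Signals flow only from inputs to outputs (feed-forward), no state is kept between calls (memory-less), and each unit thresholds a weighted sum (perceptron).

```python
def perceptron_layer(inputs, weights, biases):
    """Each unit outputs 1 if its weighted sum plus bias is positive, else 0."""
    return [1 if sum(w * x for w, x in zip(row, inputs)) + b > 0 else 0
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass the signal through the layers in order; nothing is stored between calls."""
    signal = list(inputs)
    for weights, biases in layers:
        signal = perceptron_layer(signal, weights, biases)
    return signal

# Hypothetical two-layer network computing XOR of two binary inputs.
XOR_LAYERS = [
    ([[1, 1], [1, 1]], [-0.5, -1.5]),   # hidden units: OR, AND
    ([[1, -1]], [-0.5]),                # output unit: OR and not AND
]
```

Because the output depends only on the current input vector, the delay through such a network is fixed by its depth, which is the property relevant to the latency discussion above.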

As indicated, Peterson shows that a feed-forward, memory-less, perceptron type neural
network was known to artisans at the time of invention. The system by Sutton requires a neural 
network, but merely any neural network from mature technologies. Sutton has not disclosed a 
preferred approach to those operations according to a design criterion or solution to any stated 
problem. Since it appears that the use of any neural network that is known to artisans would 
perform to provide Sutton's requirement of low latency, it would have been obvious to one of
ordinary skill in the art of real time speech recognition at the time of invention to include the 
concepts described by Peterson at least a feed-forward, memory-less, perceptron type neural 
network according to Sutton's suggestion for low latency because Peterson [at column 1, lines 44-55]
indicates that would provide faster processing within whatever certain degree of error can be
tolerated.

Conclusion

28. The following references here made of record are considered pertinent to applicant's 
disclosure: 

Margaliot et al. [US Patent Application Publication 2004/0107106] describes identifying
phonemes and visemes of speech and rapidly generating a visual persona synchronized to
speech reproduction.

Lin et al. [US Patent Application Publication 2004/0120554] describes generating and using a 
lexicon for recognizing visemes by features extracted from an utterance at an adjustable 
latency. 

29. Any response to this action should be mailed to: 

Mail Stop Amendment 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

or faxed to: 

(703) 872-9306, (for formal communications intended for entry) 

Or: 

(703) 872-9306, (for informal or draft communications, and please label 
"PROPOSED" or "DRAFT") 

Patent Correspondence delivered by hand or delivery services, other than the USPS, should 
be addressed as follows and brought to U.S. Patent and Trademark Office, Customer 
Service Window, Mail Stop Amendment, Randolph Building, 401 Dulany Street, 
Alexandria, VA 22314 

30. Any inquiry concerning this communication or earlier communications from the examiner
should be directed to Donald L. Storm, of Art Unit 2654, whose telephone number is 
(571) 272-7614. The examiner can normally be reached on weekdays between 8:00 AM and 4:30 
PM Eastern Time. If attempts to reach the examiner by telephone are unsuccessful, the 
examiner's supervisor, Richemond Dorvil can be reached on (571) 272-7602. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Inquiries regarding the status of submissions 
relating to an application or questions on the Private PAIR system should be directed to the 
Electronic Business Center (EBC) at 866-217-9197 (toll-free) or 703-305-3028 between the hours 
of 6 a.m. and midnight Monday through Friday EST, or by e-mail at: ebc@uspto.gov. For general 
information about the PAIR system, see http://pair-direct.uspto.gov. 



Donald L. Storm 
Patent Examiner 

May 19, 2005
Art Unit 2654



